Institutional Repository Keyword Analysis with Web Crawler


Abstract

This study investigates procedures for the semantic and linguistic extraction of keywords from the metadata of documents indexed in the Institutional Repository Unesp. For that purpose, a web crawler was developed that collected 325,181 keywords assigned by authors, across all fields of knowledge, from February 28th, 2013 to November 10th, 2021. The collection, preparation, and analysis environment used the Python programming language and comprised three libraries: requests, which handles the hyperlinks of the webpages visited by the crawler; BeautifulSoup, which extracts data from the HTML of each webpage for analysis; and Pandas, an open-source (free software) library that provides high-performance data analysis tools. The final listing consisted of 273,485 keywords, a reduction of 15.9% from the number initially collected. The results indicated that the most recurring problem was duplication, with 51,696 duplicated keywords, which points to inconsistencies in the search for documents. It is concluded that refining the keywords assigned by authors eliminates the incorporation of a set of symbols that do not represent keywords, as well as keywords with the same spelling but upper/lower case variations or lexical variations that are indexed as different keywords.
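The refinement step described in the abstract (discarding symbol-only entries and merging keywords that differ only in upper/lower case) can be illustrated with a minimal sketch. This is not the authors' code: the function names `normalize_keyword`, `is_symbol_only`, and `refine_keywords` are hypothetical, the sample data is invented, and the sketch uses only the Python standard library rather than Pandas so that it runs without dependencies.

```python
import unicodedata
from collections import Counter


def normalize_keyword(raw: str) -> str:
    """Build a comparison key: Unicode-normalized, whitespace-collapsed, case-folded."""
    text = unicodedata.normalize("NFKC", raw)
    text = " ".join(text.split())  # collapse runs of whitespace
    return text.casefold()         # ignore upper/lower case differences


def is_symbol_only(keyword: str) -> bool:
    """True when the string contains no letter or digit (e.g. '***' or '')."""
    return not any(ch.isalnum() for ch in keyword)


def refine_keywords(raw_keywords):
    """Return (unique keywords in first-seen form, counts of dropped entries).

    Assumption: two keywords are duplicates when their normalized keys match.
    """
    seen = {}            # normalized key -> first spelling encountered
    dropped = Counter()  # reason -> number of entries discarded
    for raw in raw_keywords:
        if is_symbol_only(raw):
            dropped["symbol_only"] += 1
            continue
        key = normalize_keyword(raw)
        if key in seen:
            dropped["duplicate"] += 1
            continue
        seen[key] = " ".join(raw.split())
    return list(seen.values()), dropped


if __name__ == "__main__":
    sample = ["Web crawler", "web crawler", "WEB  CRAWLER", "***", "Metadata", ""]
    unique, dropped = refine_keywords(sample)
    print(unique)
    print(dict(dropped))
```

Keeping the first spelling encountered is a design choice of this sketch; a real pipeline might prefer the most frequent spelling or a controlled vocabulary term instead.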


Similar resources

Towards a Keyword-Focused Web Crawler

This paper concerns predicting the content of textual web documents based on features extracted from web pages that link to them. It may be applied in an intelligent, keyword-focused web crawler. The experiments made on publicly available real data obtained from Open Directory Project with the use of several classification models are promising and indicate potential usefulness of the studied ap...


Institutional Repository

Background: Distance from home to school is an important influence on the decision to use active transport (AT); however, ecological perspectives would suggest this relationship may be moderated by individual, interpersonal, and environmental factors. This study investigates whether (i) gender, (ii) biological maturation, (iii) perceived family support for physical activity (PA),...


Institutional Repository

The global burden of foodborne disease due to the presence of contaminating microorganisms remains high, despite some notable examples of their successful reduction in some instances. Globally, the number of species of microorganisms responsible for foodborne diseases has increased over the past decades and as a result of the continued centralization of the food processing industry, outbreaks n...


Institutional Culture as Keyword

Institutional culture has become a buzzword in recent discussions of higher education in South Africa. Indeed, as references to it proliferate, there is a growing sense that institutional culture may well be the key to the successful transformation of higher education in South Africa. Or – to frame the matter as forcefully as do many recent analysts – it is simply the massive fact and bulk of i...



Journal

Journal title: Central European Journal of Educational Research

Year: 2022

ISSN: 2677-0326

DOI: https://doi.org/10.37441/cejer/2022/4/2/11395